Short Message Spam Classification using Decision Tree, Naive Bayes, and Logistic Regression
Abstract
The growing use of Short Message Service (SMS) in digital communication has been accompanied by a rise in spam messages, which threaten user convenience and information security. This study presents a comparative analysis of three classical machine learning algorithms (Decision Tree, Naïve Bayes, and Logistic Regression) for SMS spam classification. The research follows the CRISP-DM methodology, covering data collection, data understanding, data preparation, modeling, and evaluation. The dataset used is the SMS Spam Collection (A More Diverse Dataset) from Kaggle, comprising 5,574 SMS messages labeled as spam or ham. Text preprocessing consists of cleaning operations followed by feature extraction using the Term Frequency–Inverse Document Frequency (TF-IDF) method. The models are evaluated using accuracy, precision, recall, F1-score, and Area Under the Curve (AUC). Experimental results indicate that Logistic Regression achieves the most balanced performance, with an accuracy of 97.13%, precision of 99.23%, recall of 80.75%, F1-score of 89.04%, and an AUC of 98.72%. Naïve Bayes is highly efficient and attains perfect precision but lower recall, while Decision Tree offers interpretability with comparatively lower classification performance. The results suggest that Logistic Regression is the most suitable model for lightweight and reliable SMS spam detection systems, because it balances overall accuracy against the risk of misclassifying messages. This study provides practical insights for implementing efficient spam filtering solutions and serves as a reference for future research in text classification and natural language processing, particularly for short-message communication.
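The relationship between the reported precision, recall, and F1-score can be checked with a few lines of Python. The sketch below is illustrative only and is not the authors' evaluation code: the function names and the confusion counts (`tp`, `fp`, `fn`) are hypothetical, and only the 99.23% precision and 80.75% recall figures come from the abstract.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of messages flagged as spam that really are spam."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of actual spam messages that were flagged."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical confusion counts for a spam class (not taken from the study):
# 8 spam messages caught, 0 ham messages wrongly flagged, 2 spam messages missed.
tp, fp, fn = 8, 0, 2
p = precision(tp, fp)   # perfect precision, as no ham was flagged
r = recall(tp, fn)      # lower recall, as some spam was missed
f1 = f1_score(p, r)

# Precision and recall reported for Logistic Regression in the abstract.
reported_f1 = f1_score(0.9923, 0.8075)
print(f"F1 from reported precision and recall: {reported_f1:.2%}")
```

The hypothetical counts show how a model can reach perfect precision while its recall stays lower, which is the pattern the abstract describes for Naïve Bayes. The last lines recompute the F1-score from the reported precision and recall, which agrees with the 89.04% stated for Logistic Regression.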
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish articles in CoreID Journal agree to the following terms:
- Authors retain copyright of the article and grant the journal the right of first publication, with the work simultaneously licensed under the Creative Commons Attribution-ShareAlike (CC BY-SA) License.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).